I. INTRODUCTION


A. PURPOSE AND SUMMARY OF RESULTS 

The purpose of this study is to develop the logistic regression alternative for 
estimating attrition rates using length of service and grade as carrier variables. It would 
be most useful if the regression coefficients showed temporal stability and were not 
highly dependent upon the occupational specialty. It is hoped that this development
can enhance previously developed understanding of the attrition process as it affects 
the United States Marine Corps officer manpower data. 

Unfortunately, the logistic regression approach to this problem does not improve
upon estimators developed by earlier workers (see Table 8 on page 30). It does,
however, contribute to the understanding of the attrition process as it relates to length
of service and grade. The partial regression coefficients can serve in ad hoc calculations
to indicate the direction of change and to make rough estimates of the amount of
change. These coefficients do, however, change substantially as one changes the
military occupational specialty (see Table 7 on page 24). The aviation community
especially appears to possess coefficients quite different from those of other
communities.


B. BACKGROUND 

The first step in any manpower planning should be a good description of the
system or organization. Such a description allows us to obtain reasonable forecast
values. Forecasts should never be interpreted as what will happen but as central
estimates of what could happen if the assumed trends continue. They therefore provide
a guide for management action required to achieve a desired objective. Also, good
forecast values depend upon finding efficient ways to estimate attrition rates. In other
words, the description of the system, the attrition rates and the forecasts each depend
on one another.

The forecasts made by manpower planning models are affected by three general
factors: existing inventory, projected losses and projected gains. In order to project the
inventory into various future time periods it is necessary to forecast the future values
using a realistic system of flow rates.

Estimation techniques for the USMC officer attrition rates have been developed
by Major D. D. Tucker in a thesis [Ref. 1] submitted at the Naval Postgraduate School
in September 1985, and further by Major John R. Robinson in a thesis [Ref. 2]
submitted at the Naval Postgraduate School in March 1986. They used James-Stein
and other shrinkage-type parameter estimation schemes for the purpose of generating
stable manpower loss rates. The reader is referred to Tucker [Ref. 1] and Robinson
[Ref. 2] for most of the background information and the data structure used. By
necessity, some of that information will be repeated in this paper.

The United States Marine Corps has about 20,000 officers. These can be cross
classified into 40 military occupational specialties (MOS), 31 length of service (LOS)
cells and 10 grades; hence 12,400 categories for manpower planning purposes. About
half of these categories are unoccupied for structural reasons. These structural
zero categories will be described in Chapter III. The officer attrition and promotion
structure was described by Tucker [Ref. 1].

One goal of this paper is to examine whether the logistic regression model is an
efficient way to estimate the attrition rates (i.e. the rate of leaving the service, not of
changes in MOS, LOS or grade) for the officer MOS/LOS/grade categories. This
problem is difficult because of the large number of cells with low inventory. Tucker
[Ref. 1] and Robinson [Ref. 2] collected the cells into major groups or aggregates to
treat this small cell problem; attempts were made to aggregate cells that were believed
to have common statistical behavior. In the present work we will not collect the cells
into major groups. Every MOS will be taken individually. The structural zero cells will
be dropped before applying the fitting procedure; that is, structural zero cells will not
be included in the regression equations.

Seven years of data are available for the present study. The first four years
(1977 to 1980) will be used for model development and logistic regression fitting;
the last three years (1981 to 1983) will be used for validation.


C. ORGANIZATION 

Chapter II contains the details of the methodology and notation used in the
present work. A brief summary of the generalized linear regression model is presented
in this chapter.

Chapter III explains the logistic regression model structure for the Marine Corps
data and the validation procedure. A numerical example will be given to illustrate the
fitting and validation procedures. Also, in this chapter we will compare figures of merit
with Robinson's [Ref. 2] results.

Chapter IV discusses the results and recommendations.

Appendix A includes the APL functions for the data manipulation, the logistic
regression and the validation of the model.

Appendix B illustrates the logistic probability plots of residuals and the plots of
the residuals vs. fitted values for selected cases.


II. METHOD OF ESTIMATION


A. INTRODUCTION 

A major use of regression models is prediction. Thus, given data on a response
variable $y$ and associated predictor variables $x_j$ ($j = 1$ to $p$), the aim of the
regression is to find a function of the $x_j$'s which is, in some sense, a good predictor
of $y$. It is assumed throughout that the $x_j$'s at which future predictions are required
are not specified in advance but will occur randomly over some population of values
and that the success of prediction can be judged by its performance over such a
population.

Logistic regression is a member of the class of generalized linear models. An
overview of the linear model is given in the following section. The approach and
background for the logistic regression model are taken from Pregibon's paper [Ref. 3].


B. AN OVERVIEW OF THE LINEAR REGRESSION MODEL 
Linear regression is used to relate a response variable $y_i$ to one or several
explanatory or descriptive variables $x_{ij}$ through a set of linear equations of the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n$$

The $y_i$ (for $i = 1$ to $n$) are the $n$ observed values of the response variable, the
$x_{ij}$ (for $i = 1$ to $n$) are the $n$ values of the $j$th explanatory variable (for
$j = 1$ to $p$), and the parameters $\beta_j$ are the unknown regression coefficients.
The $\varepsilon_i$ are the random "errors" or fluctuations. The variables $x_{ij}$ and
$y_i$ are sometimes called "independent" and "dependent" variables.

The linear equation above can be simplified by defining an extra variable $x_{i0}$
whose value is always 1 ($x_{i0} = 1$), so the model with constant term can be written as

$$y_i = \sum_{j=0}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n$$

Usually the $\varepsilon_i$ are assumed to be statistically independent of each other with
zero means and with a constant variance that does not depend on $i$ or $x_{ij}$.
In regression we usually want to estimate the regression coefficients from the
data, either because we want to know and interpret the coefficients themselves, or
because we will use them to predict future values of $y_i$. Upon replacing the $\beta_j$
by their estimated values $\hat\beta_j$, we obtain the fitted (or "predicted") values
$\hat y_i$:

$$\hat y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij}$$

The residuals $\hat\varepsilon_i$ are defined as the differences between the observed and
the fitted values, $\hat\varepsilon_i = y_i - \hat y_i$.


The residuals are used in many diagnostic displays because they contain most of
the information regarding lack of fit of the model to the data. In terms of fitted values
and residuals, we have

data = fit + residual

which in mathematical notation is expressed as

$$y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij} + \hat\varepsilon_i, \qquad i = 1, \ldots, n$$


In matrix notation the least-squares estimate $\hat\beta$ can be found as follows:

$$Q = \|\hat\varepsilon\|^2 = \|y - X\hat\beta\|^2 = (y - X\hat\beta)^T (y - X\hat\beta)$$

where $\hat\varepsilon$ is the vector of residuals, $Q$ is the squared length of the residual
vector and $\hat y = X\hat\beta$ is the vector of fitted values. Expanding, the equation
becomes

$$Q = y^T y - 2 y^T X \hat\beta + \hat\beta^T X^T X \hat\beta$$

If we differentiate $Q$ with respect to $\hat\beta$ and set $\partial Q / \partial\hat\beta$
equal to 0, then the least-squares estimate $\hat\beta$ is obtained by solving the normal
equation

$$X^T X \hat\beta - X^T y = 0$$

The solution of this linear system is

$$\hat\beta = (X^T X)^{-1} X^T y$$

which is sensitive to poorly fit observations and extreme design points.
Presently, there is a fairly large battery of diagnostics available for detecting which
observations exert undue influence on $\hat\beta$. The two basic quantities that are most
useful for this purpose are the residuals, $\hat\varepsilon_i = y_i - x_i\hat\beta$, and the
projection matrix

$$M = I - H = I - X(X^T X)^{-1} X^T$$

where $H$ is called the hat matrix. Essentially, the vector $\hat\varepsilon$ describes the
deviation of the observed data from the fit, and $M$ the subspace in which
$\hat\varepsilon$ lies.
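
The least-squares quantities above can be illustrated with a short numerical sketch. The data are made up for illustration only (they are not USMC data), and NumPy is an assumed tool, not the APL environment used in this thesis. The sketch solves the normal equations, then forms the hat matrix $H$, the projection matrix $M$ and the residuals.

```python
import numpy as np

# Made-up illustrative data: 6 observations, one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 9.0])  # last point is deliberately poorly fit

# Design matrix with the constant column x_{i0} = 1.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X'X b = X'y for the least-squares estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X', projection matrix M = I - H, residuals.
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(len(y)) - H
residuals = y - X @ beta_hat

print("beta_hat:", beta_hat)
print("residuals:", residuals)
print("leverages h_ii:", np.diag(H))
```

Applying $M$ to $y$ reproduces the residual vector, which is the sense in which $M$ describes the subspace where the residuals lie.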

As a bottom line, the residual vector $\hat\varepsilon$ is important for the detection of
ill-fitting points, but will not adequately point to observations which unduly influence
the fit. In particular, large residuals are seldom associated with high-leverage points,
whereas small residuals (which usually pass our inspection unnoticed) are typically of
the opposite character.


C. BACKGROUND AND NOTATION FOR THE LOGISTIC REGRESSION 
1. General 

A maximum likelihood fit of a regression model is extremely sensitive to 
outlying responses and extreme points in the design space. 

Classically, logistic regression models were fitted to data obtained under
experimental conditions, for example, bioassay and related dose-response applications.
The current use of logistic regression methods includes the analysis of data obtained in
observational studies. In contrast to controlled experimentation, data from such
studies can be notoriously "bad" both from the point of view of outlying responses ($y$),
and from the point of view of extreme points in the design space ($X$). The usual
method of fitting logistic regression models, maximum likelihood, has good optimality
properties in ideal settings, but is extremely sensitive to "bad" data of the above types.

In particular, good data analysis for the logistic regression models need not be
expensive or time consuming.

2. Unstructured case 
Consider a single binomial response $y \sim B(n, p)$. If we let $\theta = \text{logit}(p) =
\log\{p/(1 - p)\}$, the probability function of $y$ can be written as

$$f(y; \theta) = \exp\{y\theta - a(\theta) + b(y)\}, \qquad y = 0, 1, \ldots, n$$

with $a(\theta) = n \log(1 + e^{\theta})$, $b(y) = \log\binom{n}{y}$, and where throughout
this paper $\log(\cdot) = \log_e(\cdot)$. Up to an arbitrary constant, the logarithm of
$f(y; \theta)$,

$$l(\theta; y) = y\theta - a(\theta) + b(y)$$

is the loglikelihood function of $\theta$. The score and information functions are given by

$$s(\theta; y) = \frac{\partial l(\theta; y)}{\partial\theta} = y - \dot a(\theta) = y - np$$

$$v(\theta; y) = -\frac{\partial^2 l(\theta; y)}{\partial\theta^2} = \ddot a(\theta) = np(1 - p)$$

where "a" with $k$ dots above it denotes $(\partial^k / \partial\theta^k)\, a(\theta)$. Standard
results yield $E(y) = np = \dot a(\theta)$ and $\text{Var}(y) = np(1 - p) = \ddot a(\theta)$.
Also, since $s(\hat\theta; y) = 0$ at the maximum likelihood estimate (m.l.e.) of $\theta$,
we have $\hat\theta = \dot a^{-1}(y) = \text{logit}(y/n)$ as the m.l.e. of $\theta$ based on a
single binomial observation $y$.

Suppose we are given a sample of $N$ independent binomial responses
$y_i \sim B(n_i, p_i)$. The loglikelihood function for the sample is the sum of the
individual loglikelihood contributions:

$$l(\theta; y) = \sum_{i=1}^{N} l(\theta_i; y_i) = \sum_{i=1}^{N} \{y_i\theta_i - a(\theta_i) + b(y_i)\}$$


3. The logistic regression model 
The likelihood function $l(\theta; y)$ is over-specified: there are as many parameters
as observations. Given a set of $m$ explanatory variables $(X_1, X_2, \ldots, X_m)$, the
logistic regression model utilizes the relationship

$$\theta = \text{logit}(p) = X\beta$$

as the description of the systematic component of the response $y$. In terms of the
$m$-dimensional parameter $\beta$, we have the loglikelihood function

$$l(X\beta; y) = \sum_{i=1}^{N} l(x_i\beta; y_i) = \sum_{i=1}^{N} \{y_i x_i\beta - a(x_i\beta) + b(y_i)\}$$

The m.l.e. $\hat\beta$ maximizes the above equation and is a solution (assumed unique) to
$(\partial/\partial\beta)\, l(X\beta; y) = 0$. In particular, $\hat\beta$ satisfies the system of
equations

$$\sum_{i=1}^{N} x_{ij}\{y_i - \dot a(x_i\hat\beta)\} = 0, \qquad j = 1, \ldots, m$$

Writing $s = y - \dot a(X\hat\beta) = y - n\hat p$, the matrix formulation of the likelihood
equations is

$$X^T s = X^T (y - \hat y) = 0$$


where $\hat y = n\hat p$ and $T$ denotes the transpose. These equations, although very
similar to their normal theory counterparts, are nonlinear in $\hat\beta$, and iterative
methods are required to solve them. Typically, when second derivatives are easy to
compute (here $-\partial^2 l / \partial\beta\,\partial\beta^T = X^T V X$ with
$V = \text{diag}\{\ddot a(x_i\beta)\}$), the Newton-Raphson method is employed. This
leads to the iterative scheme

$$\beta^{t+1} = \beta^{t} + (X^T V X)^{-1} X^T s$$

where both $V$ and $s$ are evaluated at $\beta^{t}$. At convergence ($t = u$), we take
$\hat\beta = \beta^{u}$ and denote the fitted values $n_i\hat p_i$ by $\hat y_i$. The
estimated variance of $y_i$ is $v_{ii} = n_i\hat p_i(1 - \hat p_i)$.

A most useful way to view the iterative process outlined above is as the
method of iteratively reweighted least-squares (IRLS). This is obtained by employing
the pseudo observation vector $z^{t} = X\beta^{t} + V^{-1}s$, for which the above equation
becomes

$$\beta^{t+1} = (X^T V X)^{-1} X^T V z^{t}$$

At convergence, we have $z = X\hat\beta + V^{-1}s$. Thus we may write the maximum
likelihood estimator of $\beta$ as

$$\hat\beta = (X^T V X)^{-1} X^T V z$$
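
The IRLS iteration can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the APL implementation of Appendix A: the function name `irls_logistic`, the starting value of zero, the stopping rule and the small data set are all hypothetical choices made for the example.

```python
import numpy as np

def irls_logistic(X, y, n, tol=1e-10, max_iter=50):
    """Fit logit(p) = X @ beta to binomial counts y out of n by IRLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    beta = np.zeros(X.shape[1])           # assumed starting value beta^0 = 0
    for _ in range(max_iter):
        theta = X @ beta
        p = 1.0 / (1.0 + np.exp(-theta))
        v = n * p * (1.0 - p)             # diagonal of V
        s = y - n * p                     # s = y - n p
        z = theta + s / v                 # pseudo observations z = X beta + V^{-1} s
        XtV = X.T * v                     # X' V, using that V is diagonal
        beta_new = np.linalg.solve(XtV @ X, XtV @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Made-up cells (not USMC data): constant, LOS and grade columns.
los = np.array([5, 6, 7, 8, 9, 5, 6, 7, 8, 9], dtype=float)
gr = np.array([5, 5, 5, 5, 5, 6, 6, 6, 6, 6], dtype=float)
n = np.array([40, 38, 35, 30, 28, 20, 25, 30, 33, 36], dtype=float)
y = np.array([8, 6, 5, 3, 3, 3, 3, 3, 3, 2], dtype=float)
X = np.column_stack([np.ones_like(los), los, gr])

beta_hat = irls_logistic(X, y, n)
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
print("beta_hat:", beta_hat)
print("likelihood equations X's:", X.T @ (y - n * p_hat))
```

At convergence the printed vector $X^T s$ should be numerically zero, which is the likelihood equation given earlier.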


4. Output from a maximum likelihood fit 
Once the model has been fitted (that is, we have the m.l.e. $\hat\beta$), various
quantities from the fitting process are available for the data analysis. Typically, these
quantities consist of subsets of the following:

1. the estimated parameter vector, $\hat\beta$;

2. the individual coefficient standard errors, s.e.($\hat\beta_j$);

3. the estimated covariance matrix of $\hat\beta$, $\text{var}(\hat\beta) = (X^T V X)^{-1}$;

4. the chi-squared goodness of fit statistic, $\chi^2 = \sum_i s_i^2 / v_{ii}$;

5. the individual components of $\chi^2$, namely $\chi_i = s_i/\sqrt{v_{ii}} =
(y_i - n_i\hat p_i)/\sqrt{n_i\hat p_i(1 - \hat p_i)}$;

6. the deviance $D = -2\{l(X\hat\beta; y) - l(\hat\theta; y)\}$, where $l(\hat\theta; y)$ refers
to the maximum of the loglikelihood function based on fitting each point exactly, i.e.,
$\hat\theta_i = \text{logit}(y_i/n_i)$.

Asymptotic arguments suggest that the deviance and chi-squared statistics
have the same limiting null $\chi^2(N - m)$ distribution, and hence provide some measure
of the appropriateness of the fitted model.


D. THE BASIC BUILDING BLOCKS OF REGRESSION DIAGNOSTICS 
1. Preliminaries 

After fitting a logistic regression model, and prior to drawing inferences from 
it, the natural succeeding step is that of critically assessing the fit. In practice however, 
this assessment is rarely considered and seldom carried out. The basic reasons are 

1. the lack of routine methods for performing such an analysis, and 
2. the presumably high cost of doing so. 

The role of a regression diagnostician is to provide routine methods of model
sensitivity analysis which are both intuitively appealing and inexpensive. Clearly this 
requires a thorough understanding of the model and the nature of the fitting process. 

2. The basic building blocks 

For the logistic regression model, the basic building blocks for the
identification of outlying and influential points will again be the residual vector and a
projection matrix. For the linear model, residuals are rather uniquely defined (apart
from standardization), whereas for the logistic regression model, residuals can be
defined on several (at least three) scales. The two most useful are the components of
chi-square, given above in item 5, and the components of deviance, $D = \sum_i d_i^2$, with

$$d_i = \pm\sqrt{2}\,\{l(\hat\theta_i; y_i) - l(x_i\hat\beta; y_i)\}^{1/2}$$

where the plus or minus is used according as $\hat\theta_i > x_i\hat\beta$ or
$\hat\theta_i < x_i\hat\beta$. Note that $d_i$ is defined for all values of $y_i$ even though
$\hat\theta_i$ may not be. In particular, at $y = 0$, $d^2 = -2n\log(1 - \hat p)$ and at
$y = n$, $d^2 = -2n\log(\hat p)$. Both $\chi^2$ and $D$ are measures of the
goodness-of-fit of the model.
The analog of the projection matrix for the logistic model will also be denoted
by $M$, which in its general form is given as

$$M = I - H = I - V^{1/2} X (X^T V X)^{-1} X^T V^{1/2}$$

The usefulness of $M$ arises as a consequence of the IRLS formulation described earlier.
In particular, as $\hat\beta = (X^T V X)^{-1} X^T V z$, the vector of pseudo-residuals is
given by

$$z - X\hat\beta = \{I - X(X^T V X)^{-1} X^T V\}\, z$$

Using the fact that $z = X\hat\beta + V^{-1}s$, this can be written as
$V^{-1}s = V^{-1/2} M V^{-1/2} s$. Premultiplication by the diagonal matrix $V^{1/2}$
yields $\chi = M\chi$, where $\chi = V^{-1/2}s$. Thus, as in the linear model case, $M$ is
symmetric, idempotent and spans the residual ($\chi$) space. This suggests that small
$m_{ii}$, the diagonal elements of the projection matrix $M$, should be useful in
detecting extreme points in the design space.

In most cases, the examination of $\chi_i$, $d_i$ and $m_{ii}$ will call attention to
outlying and influential points. In some cases, combinations of these (for example,
studentized residuals) will also be useful. For displaying these quantities, index plots
are generally (and, if the order of the observations is important, strongly) suggested:
that is, plots of $\chi_i$ vs $i$, $d_i$ vs $i$ and $m_{ii}$ vs $i$. In particular cases, plots of
these building blocks against the fitted values could prove useful.
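
The three building blocks can be computed directly once a fit is available. The sketch below is illustrative only: the data are made up, the helper names `xlogy` and `building_blocks` are hypothetical, and an intercept-only model is used because its m.l.e. is known in closed form (the logit of the pooled rate), so no fitting routine is needed.

```python
import numpy as np

def xlogy(a, b):
    """a * log(b) elementwise, with the convention 0 * log(0) = 0."""
    out = np.zeros_like(a, dtype=float)
    pos = a > 0
    out[pos] = a[pos] * np.log(b[pos])
    return out

def building_blocks(X, y, n, beta_hat):
    """Return chi_i, d_i and the projection matrix M for a fitted logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    v = n * p * (1.0 - p)
    chi = (y - n * p) / np.sqrt(v)                      # components of chi-square
    d2 = 2.0 * (xlogy(y, y / (n * p))
                + xlogy(n - y, (n - y) / (n * (1.0 - p))))
    d = np.sign(y / n - p) * np.sqrt(np.maximum(d2, 0.0))  # components of deviance
    Xw = X * np.sqrt(v)[:, None]                        # V^{1/2} X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
    M = np.eye(len(y)) - H
    return chi, d, M

# Made-up binomial cells, including the boundary cases y = 0 and y = n.
n = np.array([20.0, 25.0, 30.0, 15.0])
y = np.array([0.0, 3.0, 6.0, 15.0])
X = np.ones((4, 1))                                     # intercept-only model
p_pooled = y.sum() / n.sum()
beta_hat = np.array([np.log(p_pooled / (1.0 - p_pooled))])

chi, d, M = building_blocks(X, y, n, beta_hat)
print("chi_i:", chi)
print("d_i:", d)
print("m_ii:", np.diag(M))
```

An index plot is then simply each of these vectors plotted against the observation number.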
III. MODEL BUILDING WITH USMC MANPOWER DATA


A. GENERAL 

Robinson [Ref. 2] explains the conversion of the raw data to an APL workspace. A
brief explanation of the conversion is given in Appendix A. The summary data file
classifies the Marine Corps officer inventory into 40 military occupational specialties,
10 grade levels, 31 length of service cells and 8 loss categories. In the present study we
are not dealing with the type of loss; the loss categories were described by Tucker
[Ref. 1]. For use in our model we define grades and military occupational specialties
(MOS) by Tables 1 and 2. When reference is made to a particular grade or group of
grades, the code number from Table 1 is used instead of the name of the grade. For
example, this project will refer to the grades first lieutenant, captain and major as
numbers 5, 6 and 7 respectively. Tucker and Robinson used data code numbers for the
MOS instead of the actual MOS. For example, this project will refer to the air traffic
control MOS as number 37, not 73. It should also be understood that the two-digit MOS
identifier listed in Table 2 is strictly the military occupational specialty identifier in
the USMC MOS manual. We will also use the code number from Table 2 for the MOS.
The column containing the letters A through E refers to the structural zero categories.


TABLE 1
GRADES

CODE    GRADE
0       WARRANT OFFICER (WO-1)
1       CHIEF WARRANT OFFICER (CWO-2)
2       CHIEF WARRANT OFFICER (CWO-3)
3       CHIEF WARRANT OFFICER (CWO-4)
4       SECOND LIEUTENANT
5       FIRST LIEUTENANT
6       CAPTAIN
7       MAJOR
8       LIEUTENANT COLONEL
9       COLONEL



TABLE 2
MILITARY OCCUPATIONAL SPECIALTIES (MOS)

Columns: DATA CODE, MOS, STRUCTURAL ZERO CATEGORY, MOS TITLE
(The data code, MOS identifier and category columns are not legible in this copy; the
MOS titles are listed in their original order.)

UNKNOWN
PERSONNEL AND ADMINISTRATION
INTELLIGENCE
INFANTRY
LOGISTICS
FIELD ARTILLERY
UTILITIES
ENGINEER, CONSTRUCTION AND EQUIPMENT
DRAFTING, SURVEYING AND MAPPING
PRINTING AND REPRODUCTION
TANK AND AMPHIBIAN TRACTOR
ORDNANCE
AMMUNITION AND EXPLOSIVE ORDNANCE DISPOSAL
OPERATIONAL COMMUNICATIONS
SIGNALS INTELLIGENCE/GROUND ELECTRONIC WARFARE
DATA/COMMUNICATIONS MAINTENANCE
SUPPLY ADMINISTRATION AND OPERATIONS
TRANSPORTATION
FOOD SERVICE
AUDITING, FINANCE AND ACCOUNTING
MOTOR TRANSPORT
DATA SYSTEMS
MARINE CORPS EXCHANGE
PUBLIC AFFAIRS
LEGAL SERVICES
TRAINING AND AUDIOVISUAL SUPPORT
BAND
NUCLEAR, BIOLOGICAL AND CHEMICAL
MILITARY POLICE AND CORRECTIONS
ELECTRONICS MAINTENANCE
AIRCRAFT MAINTENANCE
AVIONICS
AVIATION ORDNANCE
WEATHER SERVICE
AIRFIELD SERVICES
AIR CONTROL, AIR SUPPORT AND ANTI-AIR WARFARE
AIR TRAFFIC CONTROL
PILOTS AND NAVAL FLIGHT OFFICERS
IDENTIFYING MOS AND REPORTING MOS


A structural zero is a cell whose inventory is always zero because certain grade
and length of service combinations should never appear in that military occupational
specialty (MOS). For example, a colonel with 5 years of service in any MOS, or a
warrant officer in MOS 03 (infantry), does not exist. The effect of these structural zero
categories is summarized in Table 3.
TABLE 3
STRUCTURAL ZERO CATEGORIES

Columns: Category, Grades within MOS, Number of cells, Structural zeroes per MOS
(The rows of this table are not legible in this copy.)



B. HOW TO BUILD THE LOGISTIC REGRESSION MODEL WITH USMC DATA


1. Introduction 
The purpose of this study is to develop the logistic regression model for
estimating USMC officer attrition rates using length of service (LOS) and grade (GR)
as carrier variables. The logistic regression model for the estimation of USMC officer
attrition rates can be formulated as

$$\theta = \text{logit}(p) = \beta_0 + \beta_1(\text{LOS}) + \beta_2(\text{GR})$$

In matrix notation, this can be written as

$$\theta = X\beta$$

where $X$ is an $N \times m$ matrix, also called the design space, and $\beta$ is the
$m \times 1$ matrix of regression coefficients. It follows that $\theta = \text{logit}(p)$ is an
$N \times 1$ matrix.
2. How to create the design space 

Each MOS is taken individually for the estimation of officer attrition rates.
Every MOS has dimension 31x10 for 31 LOS's and 10 grades. The LOS and grade ranges
must be broken into segments, and each segment is a separate regression. As an
example, any MOS can be broken into four segments as in Table 4. Each segment has
its own $X$ matrix. Each design space ($X$) has dimension $N \times m$, where $N$ stands
for the number of independent binomial responses and $m$ stands for the number of
explanatory variables, which is always three in our case. This $X$ matrix can be written
with columns CNT, LOS and GR:

$$X_{(N \times 3)} = \begin{bmatrix} 1 & \text{LOS}_1 & \text{GR}_1 \\ 1 & \text{LOS}_2 & \text{GR}_2 \\ \vdots & \vdots & \vdots \\ 1 & \text{LOS}_N & \text{GR}_N \end{bmatrix}$$

where CNT means the constant, which is the first column of $X$ and is always one.
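
A design matrix of this form can be assembled as sketched below. The function names and, in particular, the rule used to mark structural zero cells are hypothetical and chosen only for illustration; the actual structural zero categories are those of Table 3, and the thesis itself builds these arrays with the APL functions of Appendix A.

```python
import numpy as np

def design_matrix(los_values, grade_values, is_structural_zero):
    """Build the N x 3 matrix with columns CNT, LOS, GR, skipping structural zeros."""
    rows = []
    for los in los_values:
        for gr in grade_values:
            if not is_structural_zero(los, gr):
                rows.append([1.0, float(los), float(gr)])
    return np.array(rows)

def example_zero_rule(los, gr):
    # Purely illustrative assumption, NOT taken from Table 3:
    # treat majors (grade code 7) with fewer than 9 years of service as impossible.
    return gr == 7 and los < 9

# Segment used in the numerical example: LOS 5 to 19, grade codes 4 to 7.
X = design_matrix(range(5, 20), [4, 5, 6, 7], example_zero_rule)
print(X.shape)
print(X[:5])
```

Each retained row corresponds to one binomial response (leavers out of central inventory) in the regression for that segment.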
C. A NUMERICAL EXAMPLE FROM THE USMC DATA

As an illustration of the standard output from a maximum likelihood fit and the
use of the logistic regression model, we will use the case where military occupational
specialty (MOS) = 20 (motor transport, from Table 2), length of service (LOS) =
5 to 19 years and grades = 4, 5, 6, 7 (second lieutenant, first lieutenant, captain
and major, from Table 1). The data are listed in Table 5. They are obtained using the
APL data manipulation functions described in detail in Appendix A.

In Table 5, the structural zero inventory cells are dropped before applying the
fitting procedure. The output listed in Table 6 is obtained using the APL logistic
regression functions in Appendix A. We get the estimated coefficients of regression as
follows:

$\hat\beta_0 = 0.548539$
$\hat\beta_1 = -0.17092$
$\hat\beta_2 = -0.20117$
The deviance for the fit is 46.5863 on 28 degrees of freedom, and the corresponding
chi-squared statistic is 46.4579. Both exceed their asymptotic expectation of 28, but
neither reaches the upper 1 percent point of the $\chi^2(28)$ distribution (about 48.3),
so we find no gross inadequacies with the model. In Table 6, $\chi_i$ is the individual
component of $\chi^2$, $d_i$ is the component of deviance and $m_{ii}$ is the diagonal
element of the projection matrix $M$. The examination of $\chi_i$, $d_i$ and $m_{ii}$
calls attention to outlying and influential points. The individual components of
$\chi^2$ and of the deviance ($d_i$) are shown in logistic probability plots in Figure 3.1.
Evidently two observations, including the 13th, are not well fit by the model; their
$\chi_i$ and deviance residuals deviate from the straight line configuration of the others.
Also, fitted values are plotted against the components of the deviance and the
components of $\chi^2$ in Figure 3.2. Index plots of $\chi_i$, $d_i$ and $m_{ii}$ (i.e.
$\chi_i$ vs $i$, $d_i$ vs $i$ and $m_{ii}$ vs $i$) are shown in Figure 3.3.
We selected some cases to examine whether the coefficients of regression
y or not. The estimated coefficients of regression are listed in
Table 7 for the selected cases.
D. VALIDATION OF MODEL

A validation test was conducted to evaluate the efficiency of the logistic
regression model for the estimation of the USMC officer attrition rates. The test was
conducted as follows:

1. Select the LOS's and grades within a military occupational specialty. The
resulting desired array will be three dimensional (years, LOS, grades).

2. Let $i$ stand for LOS, then $i = 0, \ldots, 30$.

3. Let $j$ stand for GR, then $j = 0, \ldots, 9$.

4. Let $y_{ij}$ = number of leavers in cell $(i, j)$.

5. Let $n_{ij}$ = central inventory in $(i, j)$ = $\max\{(N(t) + N(t+1))/2,\; Y(t)\}$.

6. Let $t = 1, \ldots, T$, where $T$ = number of years (i.e. from 1977 to 1983) of data
used to create the estimator.

The validation procedure used $t = 1, \ldots, 4$ (i.e. from 1977 to 1980) for the fitting
and $t = 5, 6, 7$ (i.e. from 1981 to 1983) for validation.

The following procedures were utilized to validate the effectiveness of the logistic
regression estimation process. We define an indicator variable

$$D_{ij} = \begin{cases} 1 & \text{if } \hat p_{ij} \neq 0 \text{ or } 1 \\ 0 & \text{if } \hat p_{ij} = 0 \text{ or } 1 \end{cases}$$

Then

$$K = \sum D_{ij} \qquad \text{for all } i \text{ and } j$$

where $K$ is the number of nonstructural zero cells. The validation test can then be
formulated as a chi-square goodness of fit statistic as follows:

$$\text{Chi-square MOE} = \sum D_{ij}\, \frac{(p_{ij} - \hat p_{ij})^2}{\hat p_{ij}(1 - \hat p_{ij})} \qquad \text{for all } i \text{ and } j$$

where $\hat p_{ij}$ is found from the fitting using the estimator years, and $p_{ij}$
($= y_{ij}/n_{ij}$) is obtained from the leavers and the central inventory in the
validation years. For our numerical example (MOS = 3, LOS = 5 through 14 and
GR = 4, 5, 6, 7) we get the following validation test results for the years 1981, 1982 and
1983: the MOE values are 52.6998, 36.4182 and 30.6585 respectively.
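
The validation computation can be sketched as follows. This is an illustration under stated assumptions rather than the APL code of Appendix A: the function names are hypothetical, the numbers are made up, the summand is taken as it reads in the text (if the original statistic also weights each cell by its central inventory, that factor would have to be added), and skipping cells with zero central inventory is an extra assumption made here to avoid dividing by zero.

```python
import numpy as np

def central_inventory(N_t, N_t1, Y_t):
    """Central inventory max{(N(t) + N(t+1)) / 2, Y(t)} for each cell."""
    return np.maximum((N_t + N_t1) / 2.0, Y_t)

def chi_square_moe(y, n, p_hat):
    """Sum over cells of D * (p - p_hat)^2 / (p_hat * (1 - p_hat))."""
    y, n, p_hat = (np.asarray(a, dtype=float) for a in (y, n, p_hat))
    # Indicator D: fitted rate strictly between 0 and 1.
    # Assumption added here: also require a nonzero central inventory.
    D = (p_hat > 0.0) & (p_hat < 1.0) & (n > 0.0)
    p_obs = y[D] / n[D]
    return float(np.sum((p_obs - p_hat[D]) ** 2 / (p_hat[D] * (1.0 - p_hat[D]))))

# Made-up validation-year cells (not USMC data).
N_t = np.array([10.0, 0.0, 8.0])     # inventory at start of year
N_t1 = np.array([12.0, 0.0, 6.0])    # inventory at start of next year
Y_t = np.array([2.0, 0.0, 8.0])      # leavers during the year
p_hat = np.array([0.2, 0.1, 0.5])    # rates fitted on the estimator years

n_central = central_inventory(N_t, N_t1, Y_t)
moe = chi_square_moe(Y_t, n_central, p_hat)
print(n_central, moe)
```

Taking the maximum with the number of leavers keeps the observed rate $y/n$ from exceeding one in small cells.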
Figure 3.2 Plots of fitted values vs $\chi_i$ and $d_i$
Figure 3.3 Index plots of $\chi_i$, $d_i$ and $m_{ii}$
E. COMPARISON OF THE FIGURES OF MERIT

In this section we will compare the figures of merit with Major Robinson's
[Ref. 2] results. As mentioned before, he used limited translation shrinkage
estimation (LTSE) for the estimation of USMC officer attrition rates. We have been
using a different estimation method for the same manpower data. He also used the
procedure explained in the previous section to validate the effectiveness of
limited translation shrinkage estimation. In order to compare the figures of merit of
logistic regression and shrinkage estimation, we present results for some cases
in Tables 8 and 9.

The tables show that shrinkage estimation performs better than logistic
regression for most of the selected cases. We cannot say, however, that limited
translation shrinkage estimation is much better than logistic regression. The results are
very close to each other for some cases, and logistic regression is sometimes
better than shrinkage estimation (e.g. for MOS = 20, 3 ≤ LOS ≤ 9 and
4 ≤ GR ≤ 6).


TABLE 8
FIGURES OF MERIT

(0 ≤ LOS ≤ 6) AND (4 ≤ GR ≤ 6)

                                                    1981          1982
MOS = 3 (INFANTRY)
    LTSE                                         27.8528       42.4799
    REGRESSION                                   59.8577       88.5361
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                                         13.2892       18.8664
    REGRESSION                                   35.3195       31.3636
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                                         22.4989       16.1496
    REGRESSION                                (illegible)      31.5084
MOS = 20 (MOTOR TRANSPORT)
    LTSE                                         15.9591       34.4740
    REGRESSION                                   24.4329       28.3449

(3 ≤ LOS ≤ 9) AND (4 ≤ GR ≤ 6)

                                                    1981          1982
MOS = 3 (INFANTRY)
    LTSE                                         19.1602       67.2562
    REGRESSION                                   73.0644       89.0204
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                                         20.5515       19.8988
    REGRESSION                                   60.5127       40.1607
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                                         20.3665       15.3913
    REGRESSION                                   28.6348       25.9982
MOS = 20 (MOTOR TRANSPORT)
    LTSE                                         22.3545       52.2840
    REGRESSION                                   26.1725       31.6402
TABLE 9
FIGURES OF MERIT (CONT'D.)

(9 ≤ LOS ≤ 19) AND (5 ≤ GR ≤ 8)

                                                    1981          1982
MOS = 3 (INFANTRY)
    LTSE                                         84.5388    (illegible)
    REGRESSION                                  149.5783       61.7802
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                                      (illegible)      22.9296
    REGRESSION                                   84.4140       48.6112
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                                         48.3150       25.9520
    REGRESSION                                  108.1312    (illegible)
MOS = 20 (MOTOR TRANSPORT)
    LTSE                                         20.5629       24.6164
    REGRESSION                                   41.8773       44.0796

(19 ≤ LOS ≤ 29) AND (7 ≤ GR ≤ 9)

                                                    1981          1982
MOS = 3 (INFANTRY)
    LTSE                                         30.0620       18.9604
    REGRESSION                                   46.3861       28.9819
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                                         21.8423       25.2194
    REGRESSION                                   28.3865       33.0140
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                                         46.9617       20.6439
    REGRESSION                                   77.5956       36.2923
MOS = 20 (MOTOR TRANSPORT)
    LTSE                                         12.5150       15.5716
    REGRESSION                                   23.2035       27.9930
IV. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 


Recall that the logistic function and its inverse can be expressed as

$$\theta = \ln\{p/(1 - p)\} \qquad \text{and} \qquad p = e^{\theta}/(1 + e^{\theta})$$

Further, it is useful to record

$$dp/d\theta = e^{\theta}/(1 + e^{\theta})^2 = p(1 - p)$$

Identifying $p$ as the attrition rate, we can use a first-order Taylor expansion to
approximate the change in rates. Thus,

$$\Delta p \approx p(1 - p)\{\beta_1 \Delta\text{LOS} + \beta_2 \Delta\text{GR}\}$$

provides us with a linear approximation to the direction and amount of change.
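
As a worked illustration of this approximation, the sketch below uses the coefficients reported for the MOS 20 numerical example in Chapter III. The cell chosen (LOS = 10, grade code 6) and the function names are assumptions made for the example; the sketch compares the linear approximation with the exact change in the fitted rate for one additional year of service.

```python
import math

# Coefficients reported in the text for the MOS 20 numerical example.
beta0, beta1, beta2 = 0.548539, -0.17092, -0.20117

def attrition_rate(los, gr):
    """Fitted attrition rate p = e^theta / (1 + e^theta)."""
    theta = beta0 + beta1 * los + beta2 * gr
    return 1.0 / (1.0 + math.exp(-theta))

def linear_change(los, gr, d_los=0.0, d_gr=0.0):
    """First-order approximation p(1 - p)(beta1 * dLOS + beta2 * dGR)."""
    p = attrition_rate(los, gr)
    return p * (1.0 - p) * (beta1 * d_los + beta2 * d_gr)

los, gr = 10, 6   # illustrative cell, chosen only for this example
approx = linear_change(los, gr, d_los=1.0)
exact = attrition_rate(los + 1, gr) - attrition_rate(los, gr)
print("approximate change:", approx)
print("exact change:      ", exact)
```

Both quantities are negative for this cell, in line with the negative LOS coefficient, and the approximation degrades as the step in LOS or grade grows.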

Although the logistic regression approach does not improve upon the attrition
rate estimators developed by Tucker [Ref. 1] and Robinson [Ref. 2], it does point to the
direction of change as one varies LOS and GR. To this end, it was necessary to
partition the 30 year LOS range into segments. It is an exercise in curiosity to
speculate as to the reasons for observed behavior in these segments. Here is our
offering:

1. 0 ≤ LOS ≤ 5; attrition rates are chaotic as young officers "test the waters".

2. 3 ≤ LOS ≤ 9; attrition rates decline with increasing LOS as officers commit
themselves to longer second and third contracts. One would think that
advancement in grade would also correlate with a lower rate, but we do not see
that in Table 8; there are other kinds of shifts influencing the attrition
behavior in these years.

3. 9 ≤ LOS ≤ 19; the maturing career commitment has been made and rates
decline with increasing LOS and GR.

4. 19 ≤ LOS ≤ 30; since advancement opportunities of the senior officer are
limited, we see rates increasing with LOS and decreasing with advances in grade.


B. RECOMMENDATIONS 
The linear approximation to the effect of change could be most useful if we could
group the MOS categories into sets of common regression coefficients and if these
coefficients were stable over time. To pursue each of these contingencies requires
additional work and an expanded data base. The programs developed in this thesis
serve as a foundation for extension.
APPENDIX A
APL FUNCTIONS 


1. GENERAL

This appendix contains APL functions for the data manipulation, logistic
regression and the validation of the model. The original data are on a magnetic tape
named COUNTS prepared by the Navy Personnel Research and Development Center
(NPRDC). Robinson [Ref. 2] explained the conversion of raw data from tape to an
APL workspace. In order to get the LOSSXX (losses) and INVXX (inventories)
arrays, the procedure should be followed in the order presented by Robinson. "XX" is
the applicable fiscal year (e.g. 77 for fiscal year 1977).


2. DATA MANIPULATION FUNCTIONS

Some APL functions were developed by Tucker and Robinson for the data
manipulation and execution of calculations pertaining to the processes under
evaluation. These functions will be summarized in the following section. We will use
some of them in this project. They are GETINV, INVMATX, GETLOSS and
MATRIX. Also, two more APL functions were utilized for the manipulation of the
data in order to use the logistic regression and validation.

a. Creating the inventory and loss arrays 

Using the INVXX arrays and the APL functions GETINV in Figure A.1 and 
INVMATX in Figure A.2, create the IXX arrays. Note that GETINV calls INVMATX 
and INVMATX uses the INVXX arrays. APL workspace size limitations may be a 
problem due to the large amount of data. It may be necessary to create one or two 
arrays at a time and copy them to another workspace. 

The LXX arrays are created in a manner similar to the above, using the APL 
functions GETLOSS in Figure A.3 and MATRIX in Figure A.4. APL function 
MATRIX uses the loss arrays LOSSXX. The resulting matrices are “IXX” and “LXX” 
for fiscal year “XX”. The functions “INVMATX” and “MATRIX” could create a matrix 
of dimension 7x40x10x31 for 7 years, 40 MOS’s, 10 grades and 31 LOS’s. 
However, due to limited workspace, the dimension of 40x31x10 for 40 MOS’s, 31 LOS’s 
and 10 grades was commonly used. 


∇ GETINV 
⍝ THIS FUNCTION CALLS THE FUNCTION INVMATX 
⍝ FOR EACH FISCAL YEAR. IXX IS THE INVENTORY 

Figure A.1. APL Function GETINV. 
Figure A.2. APL Function INVMATX. 


b. Manipulation of the data for regression and validation 
The function GETCENINV in Figure A.5 creates the central inventory 
for the fiscal years from 1977 to 1983. The function GETCENINV uses 
the global variables “IXX” and “LXX” for the inventory and loss matrices 
respectively, for fiscal year “XX”. 


∇ GETLOSS 

Figure A.3. APL Function GETLOSS. 


∇ Z←MATRIX X;A;B;C; 
⍝ THIS FUNCTION CREATES THE LOSS ARRAY FOR THE FISCAL 
⍝ YEARS USING THE ARRAY OF LOSS INDICES LOSSXX. IT IS 
⍝ CALLED BY GETLOSS. [illegible] CHARACTER VECTOR 
⍝ WITH [illegible] DATA ENTRIES FOLLOWED BY 1 BLANK FOR EACH LOOP. 
[The remaining lines of this listing are illegible in the source.] 

Figure A.4. APL Function MATRIX. 


The function GETDATA in Figure A.6 manipulates the data for the regression 
and validation procedures. The outputs IEST and LEST are the sums of CIXX and 
LXX respectively, where “XX” is the fiscal years 1977 to 1980, i.e. the first four 
years are used for the estimation. “IVALXX” and “LVALXX” are the CIXX and LXX 
respectively, where “XX” here is the fiscal years from 1981 to 1983, i.e. the last three 
years are used for the validation procedure. The function GETDATA uses the global 
variables CIXX and LXX for the central inventory matrix and loss matrix for fiscal 
year “XX”. 

Figure A.6. APL Function GETDATA. 
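
The split into estimation and validation data can be sketched as follows. This is a minimal Python illustration, not the APL function GETDATA itself. The dictionary layout, the names `pool` and `get_data`, and the toy counts are assumptions made for the example.

```python
EST_YEARS = (77, 78, 79, 80)   # estimation years, pooled together
VAL_YEARS = (81, 82, 83)       # validation years, kept separate

def pool(arrays_by_year, years):
    """Element-wise sum of the per-year cell counts over the given years."""
    total = {}
    for yr in years:
        for cell, count in arrays_by_year[yr].items():
            total[cell] = total.get(cell, 0) + count
    return total

def get_data(ci, lx):
    """Return pooled estimation arrays and per-year validation arrays."""
    iest, lest = pool(ci, EST_YEARS), pool(lx, EST_YEARS)
    ival = {yr: dict(ci[yr]) for yr in VAL_YEARS}
    lval = {yr: dict(lx[yr]) for yr in VAL_YEARS}
    return iest, lest, ival, lval

# Toy data: a single cell keyed by (MOS index, LOS, grade); values are made up.
cell = (0, 5, 3)
ci = {yr: {cell: 100.0 + yr} for yr in EST_YEARS + VAL_YEARS}
lx = {yr: {cell: 10} for yr in EST_YEARS + VAL_YEARS}
iest, lest, ival, lval = get_data(ci, lx)
```

Pooling the four estimation years gives larger cell counts for fitting. Keeping the validation years separate lets each be compared with the fitted model on its own.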

c. Why the central inventory? 

A problem arises on several occasions when the data is disaggregated to a 
level for which the inventory is very small. For example, in a particular fiscal 
year the inventory can be zero for a given length of service (LOS) and 
military occupational specialty (MOS) combination. The inventory in the 
next fiscal year for the same LOS and MOS combination may also be zero. The 
problem arises when the number of leavers is equal to or greater than one. 
∇ LOGISTIC 
⍝ THIS IS THE MAIN FUNCTION FOR THE REGRESSION DIAGNOSTICS 
⍝ AND THE VALIDATION. THIS FUNCTION CALLS THE FUNCTIONS 
⍝ FITTED, RESIDUAL AND VALIDATION, WHICH MUST ALL BE 
⍝ IN THE SAME APL WORKSPACE. 

Figure A.7. APL Function LOGISTIC. 


This can occur because the inventory figures refer to the beginning of 
the fiscal year, while the loss figures refer to any time during the year; i.e., an 
officer can both enter a cell and attrite from it at any time during the year. Then 
p (= y/n) would be ambiguous, where y is the number of leavers and n is the 
inventory at time t. 

For the purpose of removing this ambiguity from the data, the following 
policy was adopted to define the central inventory number for the officer force at 
disaggregated levels for any cells or collection of cells. 

1. Let t = 1, ..., 6 refer to the years 1977 to 1982 
2. Let Y(t) = Number of losses in year t 
3. Let INV(t) = Inventory at the beginning of year t 
4. Let X(t) = the maximum of Y(t) and the average inventory, using the beginning 
inventories of years t and t+1 and computing their average (INV(t) + 
INV(t+1))/2. X(t) is the central inventory of year t. This will provide the 
elements for a more accurate estimation of the attrition rate on the 
disaggregated level. 

Figure A.8. APL Function FITTED. 
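
A minimal sketch of the central inventory calculation follows, in Python rather than APL. It assumes the policy takes the larger of the year's losses Y(t) and the average of the beginning inventories of years t and t+1. The function names are hypothetical, and returning a zero rate for an empty cell is an added convention for the example.

```python
def central_inventory(losses, inv_start, inv_next):
    """Central inventory for one cell: the larger of the year's losses and
    the mean of the beginning inventories of years t and t+1."""
    average = (inv_start + inv_next) / 2.0
    return max(float(losses), average)

def attrition_rate(losses, inv_start, inv_next):
    """Loss rate p = y / n with n the central inventory; 0 for an empty cell."""
    n = central_inventory(losses, inv_start, inv_next)
    return losses / n if n > 0 else 0.0

# A cell with nobody on board at either year start but one leaver in between.
rate_sparse = attrition_rate(1, 0, 0)      # n = max(1, 0) = 1
# A well-populated cell, where the average inventory dominates.
rate_usual = attrition_rate(5, 48, 52)     # n = max(5, 50) = 50
```

Under this assumption the rate can never exceed one, and a cell with leavers but zero recorded inventory still yields a defined rate.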

3. LOGISTIC REGRESSION AND VALIDATION FUNCTIONS 
The following APL functions were utilized for the logistic regression and the 
validation of the model. These functions must be in the same APL workspace. Also, 
they use the global variables IEST, LEST, IVAL81, IVAL82, IVAL83, LVAL81, 
LVAL82 and LVAL83, which are the output of the function GETDATA. 
Figure A.9. APL Function RESIDUAL. 


a. Function LOGISTIC 
APL function LOGISTIC in Figure A.7 is the main function for the regression 
and validation calculations. This function calls the FITTED, RESIDUAL and 
VALIDATION functions. These functions cannot be run alone; they must be run by 
the function LOGISTIC. In other words, they are subfunctions of the main 
function LOGISTIC. These subfunctions are discussed below. 
b. Function FITTED 
APL function FITTED in Figure A.8 finds the fitted values of the regression. 
This function uses global variables “IEST” and “LEST”. 
c. Function RESIDUAL 
APL function RESIDUAL in Figure A.9 calculates the array of the residuals. 
This function is just the continuation of the function FITTED. 
d. Function VALIDATION 
APL function VALIDATION in Figure A.10 calculates the chi-square 
statistics for the fiscal years from 1981 to 1983. This function uses the global variables 
IVALXX and LVALXX, where “XX” are the fiscal years from 1981 to 1983. 

Figure A.10. APL Function VALIDATION. 
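
The kind of statistic computed for each validation year can be sketched as below. This is a Python illustration that assumes the usual Pearson chi-square for binomial counts, with fitted rates taken from the estimation-year regression; the appendix does not show the exact APL formula. The function name, the rule for skipping empty cells, and the toy numbers are assumptions.

```python
def pearson_chi_square(losses, inventory, fitted_p):
    """Pearson chi-square and its per-cell components for binomial counts."""
    components = []
    for y, n, p in zip(losses, inventory, fitted_p):
        if n <= 0 or p <= 0.0 or p >= 1.0:
            continue  # skip empty cells and degenerate fitted rates
        expected = n * p                 # expected number of losses
        variance = n * p * (1.0 - p)     # binomial variance of the count
        components.append((y - expected) ** 2 / variance)
    return sum(components), components

# Toy validation year: three cells with made-up counts and fitted rates.
lval = [12, 3, 0]
ival = [100.0, 50.0, 0.0]
p_hat = [0.10, 0.10, 0.10]
chisq, parts = pearson_chi_square(lval, ival, p_hat)
```

Repeating the call for each of the three validation years would give one statistic per year, to be compared with a chi-square distribution on the appropriate degrees of freedom.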

e. Description of the output variables 

In this section, we will describe the output variables which are used in the 
APL functions. 

BETA    : vector of the regression coefficients 
TETHA   : vector of logit(p) where p = y/n 
TETHAT  : vector of fitted values 
DEV     : vector of components of the deviance 
CHICOM  : vector of individual components of the chi-square 
MD      : vector of diagonal elements of the projection matrix 
CHI     : the chi-squared goodness of fit statistic for estimation years 
D       : total deviance 
CHISQ   : the vector of chi-squared test statistics for validation years 
DEF     : degrees of freedom 
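
To make these quantities concrete, the sketch below fits a grouped logistic regression and computes the coefficient vector, the deviance and chi-square components, and their totals (the roles played by BETA, CHICOM, D and CHI above). It is written in Python and uses standard iteratively reweighted least squares. Whether the APL functions use the same algorithm is not stated here, so this is an assumption, as are all names and the toy data. The projection-matrix diagonal is omitted.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(rows, y, n, iters=25):
    """Iteratively reweighted least squares for grouped binomial data.
    Assumes every cell has a positive inventory n."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        xtwx = [[0.0] * k for _ in range(k)]
        xtwz = [0.0] * k
        for x, yi, ni in zip(rows, y, n):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)           # binomial working weight
            z = eta + (yi - ni * p) / w      # working response
            for i in range(k):
                xtwz[i] += x[i] * w * z
                for j in range(k):
                    xtwx[i][j] += x[i] * w * x[j]
        beta = solve(xtwx, xtwz)
    return beta

def diagnostics(rows, y, n, beta):
    """Per-cell deviance and chi-square components plus their totals."""
    dev, chicom = [], []
    for x, yi, ni in zip(rows, y, n):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        mu = ni * p
        t1 = yi * math.log(yi / mu) if yi > 0 else 0.0
        t2 = (ni - yi) * math.log((ni - yi) / (ni - mu)) if yi < ni else 0.0
        dev.append(2.0 * (t1 + t2))
        chicom.append((yi - mu) ** 2 / (mu * (1.0 - p)))
    return dev, chicom, sum(dev), sum(chicom)

# Toy cells: (intercept, LOS, GR) with made-up losses y and inventories n.
rows = [(1, 3, 2), (1, 5, 2), (1, 7, 3), (1, 9, 3), (1, 11, 4), (1, 13, 4)]
y = [40, 30, 24, 18, 12, 10]
n = [200.0] * 6
beta = fit_logistic(rows, y, n)
dev, chicom, d_total, chi_total = diagnostics(rows, y, n, beta)
```

At convergence the fitted losses sum to the observed losses, which is a quick check that the fit has settled.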
APPENDIX B 
GRAPHS 


This appendix contains graphical illustrations of the fit obtained when estimating 
USMC officer attrition rates. Some cases were selected from the USMC manpower 
data to illustrate whether or not the logistic regression model fits the data well. 
Each case has its own regression. In Figures B.1 through B.8, the following plots 
are shown for each case: 

1. logistic probability plot of components of the deviance 
2. logistic probability plot of components of the chi-square 
3. scatter plot of fitted values vs components of the deviance 
4. scatter plot of fitted values vs components of the chi-square 
Figure B.4 

Figure B.5 